$\chi^2$-confidence sets in high-dimensional regression


Sara van de Geer, Benjamin Stucky 


Abstract We study a high-dimensional regression model. The aim is to construct a confidence
set for a given group of regression coefficients, treating all other regression coefficients as
nuisance parameters. We apply a one-step procedure with the square-root Lasso as initial
estimator and a multivariate square-root Lasso for constructing a surrogate Fisher information
matrix. The multivariate square-root Lasso is based on nuclear norm loss with $\ell_1$-penalty.
We show that this procedure leads to an asymptotically $\chi^2$-distributed pivot, with
a remainder term depending only on the $\ell_1$-error of the initial estimator. We show that
under $\ell_1$-sparsity conditions on the regression coefficients $\beta^0$ the square-root Lasso
produces a consistent estimator of the noise variance and we establish sharp oracle inequalities
which show that the remainder term is small under further sparsity conditions on $\beta^0$ and
compatibility conditions on the design.


1 Introduction 


Let $X$ be a given $n \times p$ input matrix and $Y$ be a random $n$-vector of responses. We consider
the high-dimensional situation where the number of variables $p$ exceeds the number of
observations $n$. The expectation of $Y$ (assumed to exist) is denoted by $f^0 := \mathbb{E}Y$. We assume
that $X$ has rank $n$ ($n < p$) and let $\beta^0$ be any solution of the equation $X\beta^0 = f^0$. Our aim is to
construct a confidence set for a pre-specified group of coefficients $\beta_J^0 := \{\beta_j^0 : j \in J\}$
where $J \subset \{1,\ldots,p\}$ is a subset of the indices. In other words, the $|J|$-dimensional vector
$\beta_J^0$ is the parameter of interest and all the other coefficients $\beta_{-J}^0 := \{\beta_j^0 : j \notin J\}$ are nuisance
parameters.


For one-dimensional parameters of interest ($|J| = 1$) the approach in this paper is closely
related to earlier work. The method is introduced in Zhang and Zhang [2014]. Further
references are Javanmard and Montanari [2013] and van de Geer et al. [2014]. Related
approaches can be found in Belloni et al. [2013a], Belloni et al. [2013b] and Belloni et al.
[2014].


For confidence sets for groups of variables ($|J| > 1$) one usually would like to take the
dependence between estimators of single parameters into account. An important paper that
carefully does this for confidence sets in $\ell_2$ is Mitra and Zhang [2014]. Our approach is
related but differs in an important way. As in Mitra and Zhang [2014] we propose a
de-sparsified estimator which is (potentially) asymptotically linear. However, Mitra and Zhang
[2014] focus on a remainder term which is small also for large groups. Our goal is rather to
present a construction which has a small remainder term after studentizing and which does
not rely on strong conditions on the design $X$. In particular we do not assume any sparsity
conditions on the design.


The construction involves the square-root Lasso $\hat\beta$ which is introduced by Belloni et al.
[2011]. See Section 2 for the definition of the estimator $\hat\beta$. We present a multivariate
extension of the square-root Lasso which takes the nuclear norm of the multivariate residuals
as loss function. Then we define in Section 3.1 a de-sparsified estimator $\hat b_J$ of $\beta_J^0$ which has
the form of a one-step estimator with $\hat\beta_J$ as initial estimator and with the multivariate
square-root Lasso invoked to obtain a surrogate Fisher information matrix. We show that when
$Y \sim \mathcal{N}_n(f^0, \sigma_0^2 I)$ (with both $f^0$ and $\sigma_0^2$ unknown), a studentized version of $\hat b_J - \beta_J^0$ has
asymptotically a $|J|$-dimensional standard normal distribution.


More precisely we will show in Theorem 1 that for a given $|J| \times |J|$ matrix $M = M_\lambda$ depending
only on $X$ and on a tuning parameter $\lambda$, one has

$$M(\hat b_J - \beta_J^0)/\sigma_0 = \mathcal{N}_{|J|}(0, I) + \mathrm{rem},$$

where the remainder term "rem" can be bounded by $\|\mathrm{rem}\|_\infty \le \sqrt{n}\,\lambda\,\|\hat\beta_{-J} - \beta_{-J}^0\|_1/\sigma_0$. The
choice of the tuning parameter $\lambda$ is "free" (and not depending on $\sigma_0$); it can for example
be taken of order $\sqrt{\log p/n}$. The unknown parameter $\sigma_0^2$ can be estimated by the
normalized residual sum of squares $\hat\sigma^2 := \|Y - X\hat\beta\|_2^2/n$ of the square-root Lasso $\hat\beta$. We
show in Lemma 3 that under sparsity conditions on $\beta^0$ one has $\hat\sigma^2/\sigma_0^2 = 1 + o_{\mathbb{P}}(1)$ and then
in Theorem 2 an oracle inequality for the square-root Lasso under further sparsity conditions
on $\beta^0$ and compatibility conditions on the design. The oracle result allows one to
"verify" when $\sqrt{n}\,\lambda\,\|\hat\beta_{-J} - \beta_{-J}^0\|_1/\sigma_0 = o_{\mathbb{P}}(1)$ so that the remainder term rem is negligible.
An illustration assuming weak sparsity conditions is given in Section 5. As a consequence

$$\|M(\hat b_J - \beta_J^0)\|_2^2/\hat\sigma^2 = \chi^2_{|J|}\,(1 + o_{\mathbb{P}}(1)),$$

where $\chi^2_{|J|}$ is a random variable having a $\chi^2$-distribution with $|J|$ degrees of freedom. For
$|J|$ fixed one can thus construct asymptotic confidence sets for $\beta_J^0$ (we will also consider
the case $|J| \to \infty$ in Section 8). We however do not control the size of these sets. Larger
values for $\lambda$ make the confidence sets smaller but will also give a larger remainder term.

In Section 6 we extend the theory to structured sparsity norms other than $\ell_1$, for example
the norm used for the (square-root) group Lasso, where the demand for $\ell_2$-confidence sets
for groups comes up quite naturally. Section 8 contains a discussion. The proofs are in
Section 9.


1.1 Notation 

The mean vector of $Y$ is denoted by $f^0$ and the noise is $\epsilon := Y - f^0$. For a vector $v \in \mathbb{R}^n$
we write (with a slight abuse of notation) $\|v\|_n^2 := v^T v/n$. We let $\sigma_0^2 := \mathbb{E}\|\epsilon\|_n^2$ (assumed to
exist).

For a vector $\beta \in \mathbb{R}^p$ we set $S_\beta := \{j : \beta_j \neq 0\}$. For a subset $J \subset \{1,\ldots,p\}$ and a vector
$\beta \in \mathbb{R}^p$ we use the same notation $\beta_J$ for the $|J|$-dimensional vector $\{\beta_j : j \in J\}$ and the
$p$-dimensional vector $\{\beta_{j,J} := \beta_j 1\{j \in J\} : j = 1,\ldots,p\}$. The last version allows us to write
$\beta = \beta_J + \beta_{-J}$ with $\beta_{-J} = \beta_{J^c}$, $J^c$ being the complement of the set $J$. The $j$-th column of $X$
is denoted by $X_j$ ($j = 1,\ldots,p$). We let $X_J := \{X_j : j \in J\}$ and $X_{-J} := \{X_j : j \notin J\}$.

For a matrix $A$ we let $\|A\|_{\mathrm{nuclear}} := \mathrm{trace}((A^T A)^{1/2})$ be its nuclear norm. The $\ell_1$-norm of the
matrix $A$ is defined as $\|A\|_1 := \sum_k \sum_j |a_{k,j}|$. Its $\ell_\infty$-norm is $\|A\|_\infty := \max_k \max_j |a_{k,j}|$.
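
As a small illustration of this notation, the norms can be computed numerically. The following is a minimal sketch using NumPy; the function names are hypothetical and chosen only for this example. The nuclear norm is evaluated as the sum of the singular values, which equals $\mathrm{trace}((A^T A)^{1/2})$.

```python
import numpy as np

def norm_n_sq(v):
    """Squared normalized Euclidean norm ||v||_n^2 = v^T v / n."""
    v = np.asarray(v, dtype=float)
    return float(v @ v) / v.shape[0]

def nuclear_norm(A):
    """Nuclear norm trace((A^T A)^{1/2}), i.e. the sum of singular values."""
    A = np.asarray(A, dtype=float)
    return float(np.linalg.svd(A, compute_uv=False).sum())

def l1_norm(A):
    """Entrywise l1-norm of a matrix."""
    return float(np.abs(A).sum())

def linf_norm(A):
    """Entrywise l-infinity norm of a matrix."""
    return float(np.abs(A).max())

# A diagonal example: the singular values are 4 and 3.
A = np.array([[3.0, 0.0], [0.0, -4.0]])
```

For this diagonal matrix the nuclear norm and the $\ell_1$-norm coincide; in general they differ.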


2 The square-root Lasso and its multivariate version 
2.1 The square-root Lasso 


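The square-root Lasso (Belloni et al. [2011]) $\hat\beta$ is

$$\hat\beta := \arg\min_{\beta \in \mathbb{R}^p} \Big\{ \|Y - X\beta\|_n + \lambda_0 \|\beta\|_1 \Big\}. \qquad (1)$$

The parameter $\lambda_0 > 0$ is a tuning parameter. Thus $\hat\beta$ depends on $\lambda_0$ but we do not express
this in our notation.

The square-root Lasso can be seen as a method that estimates $\beta^0$ and the noise variance $\sigma_0^2$
simultaneously. Defining the residuals $\hat\epsilon := Y - X\hat\beta$ and letting $\hat\sigma^2 := \|\hat\epsilon\|_n^2$ one clearly has

$$(\hat\beta, \hat\sigma^2) = \arg\min_{\beta \in \mathbb{R}^p,\ \sigma^2 > 0} \Big\{ \frac{\|Y - X\beta\|_n^2}{\sigma} + \sigma + 2\lambda_0 \|\beta\|_1 \Big\} \qquad (2)$$

provided the minimum is attained at a positive value of $\sigma^2$.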


We note in passing that the square-root Lasso is not a quasi-likelihood estimator as the
function $\exp[-z^2/\sigma - \sigma]$, $z \in \mathbb{R}$, is not a density with respect to a dominating measure not
depending on $\sigma^2 > 0$. The square-root Lasso is moreover not to be confused with the scaled
Lasso. The latter is a quasi-likelihood estimator. It is studied in e.g. Sun and Zhang [2012].

We show in Section 4.1 (Lemmas 2 and 3) that for the case where $\epsilon \sim \mathcal{N}_n(0, \sigma_0^2 I)$ for
example one has $\hat\sigma \to \sigma_0$ under $\ell_1$-sparsity conditions on $\beta^0$. In Section 4.2 we establish
oracle results for $\hat\beta$ under further sparsity conditions on $\beta^0$ and compatibility conditions on
$X$ (see Definition 2 for the latter). These results hold for a "universal" choice of $\lambda_0$ provided
an $\ell_1$-sparsity condition on $\beta^0$ is met.


In the proof of our main result in Theorem 1 the so-called Karush-Kuhn-Tucker conditions,
or KKT-conditions, play a major role. Let us briefly discuss these here. The KKT-conditions
for the square-root Lasso say that

$$\frac{X^T (Y - X\hat\beta)/n}{\hat\sigma} = \lambda_0 \hat z, \qquad (3)$$

where $\hat z$ is a $p$-dimensional vector with $\|\hat z\|_\infty \le 1$ and with $\hat z_j = \mathrm{sign}(\hat\beta_j)$ if $\hat\beta_j \neq 0$. This
follows from sub-differential calculus which defines the sub-differential of the absolute
value function $x \mapsto |x|$ as

$$\{\mathrm{sign}(x)\}\{x \neq 0\} + [-1, 1]\{x = 0\}.$$

Indeed, for a fixed $\sigma > 0$ the sub-differential with respect to $\beta$ of the expression in curly
brackets given in (2) is equal to

$$-\frac{2 X^T (Y - X\beta)/n}{\sigma} + 2\lambda_0 z(\beta)$$

with, for $j = 1,\ldots,p$, $z_j(\beta)$ the sub-differential of $\beta_j \mapsto |\beta_j|$. Setting this to zero at $(\hat\beta, \hat\sigma)$
gives the above KKT-conditions (3).
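
The KKT-conditions can be checked numerically for any candidate vector. The sketch below (NumPy; the function name and the tolerance are assumptions made for this illustration, not part of the paper) returns the largest violation of the conditions at a given $\beta$ with non-zero residual. Since the square-root Lasso objective is convex, a violation of zero indicates a minimizer.

```python
import numpy as np

def sqrt_lasso_kkt_gap(X, Y, beta, lam0, tol=1e-8):
    """Largest violation of the square-root Lasso KKT-conditions at beta.

    Assumes the residual Y - X beta is non-zero, so that sigma > 0.
    """
    n = X.shape[0]
    resid = Y - X @ beta
    sigma = np.sqrt(resid @ resid / n)       # ||Y - X beta||_n
    g = X.T @ resid / (n * sigma)            # should equal lam0 * z
    active = np.abs(beta) > tol
    # On the active set we need g_j = lam0 * sign(beta_j).
    gap_active = np.abs(g[active] - lam0 * np.sign(beta[active]))
    # Off the active set we only need |g_j| <= lam0.
    gap_inactive = np.maximum(np.abs(g[~active]) - lam0, 0.0)
    return float(max(gap_active.max(initial=0.0), gap_inactive.max(initial=0.0)))

# Example: beta = 0 satisfies the conditions exactly when
# lam0 >= ||X^T Y||_inf / (n ||Y||_n).
rng = np.random.default_rng(0)
X = rng.standard_normal((20, 50))
Y = rng.standard_normal(20)
lam_max = np.abs(X.T @ Y).max() / (20 * np.sqrt(Y @ Y / 20))
gap_at_lam_max = sqrt_lasso_kkt_gap(X, Y, np.zeros(50), lam_max)
gap_at_half = sqrt_lasso_kkt_gap(X, Y, np.zeros(50), 0.5 * lam_max)
```

In this example the zero vector passes the check at `lam_max` and fails it at half that value.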


2.2 The multivariate square-root Lasso 


In our construction of confidence sets we will consider the regression of $X_J$ on $X_{-J}$ invoking
a multivariate version of the square-root Lasso. To explain the latter, we use here a standard
notation with $X$ being the input and $Y$ being the response. We will then replace $X$ by $X_{-J}$
and $Y$ by $X_J$ in Section 3.1.

The matrix $X$ is as before an $n \times p$ input matrix and the response $Y$ is now an $n \times q$ matrix
for some $q \ge 1$. We define the multivariate square-root Lasso

$$\hat B := \arg\min_{B} \Big\{ \|Y - XB\|_{\mathrm{nuclear}}/\sqrt{n} + \lambda_0 \|B\|_1 \Big\} \qquad (4)$$

with $\lambda_0 > 0$ again a tuning parameter. The minimization is over all $p \times q$ matrices $B$. We
consider $\hat\Sigma := (Y - X\hat B)^T (Y - X\hat B)/n$ as estimator of the noise co-variance matrix.

The KKT-conditions for the multivariate square-root Lasso will be a major ingredient of the
proof of the main result in Theorem 1. We present these KKT-conditions in the following
lemma in equation (5).


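Lemma 1. We have

$$(\hat B, \hat\Sigma) = \arg\min_{B,\ \Sigma > 0} \Big\{ \mathrm{trace}\big((Y - XB)^T (Y - XB)\Sigma^{-1/2}\big)/n + \mathrm{trace}(\Sigma^{1/2}) + 2\lambda_0 \|B\|_1 \Big\},$$

where the minimization is over all symmetric positive definite matrices $\Sigma$ (this being denoted
by $\Sigma > 0$) and where it is assumed that the minimum is indeed attained at some $\Sigma > 0$. The
multivariate square-root Lasso satisfies the KKT-conditions

$$X^T (Y - X\hat B)\hat\Sigma^{-1/2}/n = \lambda_0 \hat Z, \qquad (5)$$

where $\hat Z$ is a $p \times q$ matrix with $\|\hat Z\|_\infty \le 1$ and with $\hat Z_{k,j} = \mathrm{sign}(\hat B_{k,j})$ if $\hat B_{k,j} \neq 0$ ($k = 1,\ldots,p$,
$j = 1,\ldots,q$).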
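
The first part of Lemma 1 can be illustrated numerically. For a fixed residual matrix $R = Y - XB$ with $R^T R/n$ positive definite, minimizing $\mathrm{trace}(R^T R\,\Sigma^{-1/2})/n + \mathrm{trace}(\Sigma^{1/2})$ over $\Sigma > 0$ gives $\Sigma = R^T R/n$ and the minimal value $2\|R\|_{\mathrm{nuclear}}/\sqrt{n}$, which is twice the loss in (4). The following sketch (NumPy, with hypothetical helper names and random data standing in for residuals) checks this on one example.

```python
import numpy as np

def sym_power(S, power):
    """S^power for a symmetric positive definite matrix via eigendecomposition."""
    w, V = np.linalg.eigh(S)
    return (V * w**power) @ V.T

def profile_objective(R, Sigma):
    """trace(R^T R Sigma^{-1/2})/n + trace(Sigma^{1/2}) for n x q residuals R."""
    n = R.shape[0]
    return float(np.trace(R.T @ R @ sym_power(Sigma, -0.5)) / n
                 + np.trace(sym_power(Sigma, 0.5)))

rng = np.random.default_rng(1)
R = rng.standard_normal((30, 3))          # stand-in for residuals Y - XB
Sigma_hat = R.T @ R / 30                  # candidate minimizer over Sigma

at_minimum = profile_objective(R, Sigma_hat)
twice_nuclear = 2 * np.linalg.svd(R, compute_uv=False).sum() / np.sqrt(30)
perturbed = profile_objective(R, Sigma_hat + 0.3 * np.eye(3))
```

Here `at_minimum` agrees with `twice_nuclear` up to rounding, and moving $\Sigma$ away from $R^T R/n$ increases the objective.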


3 Confidence sets for $\beta_J^0$
3.1 The construction 

Let $J \subset \{1,\ldots,p\}$. We are interested in building a confidence set for $\beta_J^0 := \{\beta_j^0 : j \in J\}$.
To this end, we compute the multivariate ($|J|$-dimensional) square-root Lasso

$$\hat\Gamma_J := \arg\min_{\Gamma_J} \Big\{ \|X_J - X_{-J}\Gamma_J\|_{\mathrm{nuclear}}/\sqrt{n} + \lambda \|\Gamma_J\|_1 \Big\} \qquad (6)$$

where $\lambda > 0$ is a tuning parameter. The minimization is over all $(p - |J|) \times |J|$ matrices $\Gamma_J$.
We let

$$\tilde T_J := (X_J - X_{-J}\hat\Gamma_J)^T X_J/n \qquad (7)$$

and

$$\hat T_J := (X_J - X_{-J}\hat\Gamma_J)^T (X_J - X_{-J}\hat\Gamma_J)/n. \qquad (8)$$

We assume throughout that the "hat" matrix $\hat T_J$ is non-singular. The "tilde" matrix $\tilde T_J$ only
needs to be non-singular in order that the de-sparsified estimator $\hat b_J$ given below in
Definition 1 is well-defined. However, for the normalized version we need not assume
non-singularity of $\tilde T_J$.

The KKT-conditions (5) appear in the form

$$X_{-J}^T (X_J - X_{-J}\hat\Gamma_J)\hat T_J^{-1/2}/n = \lambda \hat Z_J, \qquad (9)$$

where $\hat Z_J$ is a $(p - |J|) \times |J|$ matrix with $(\hat Z_J)_{k,j} = \mathrm{sign}(\hat\Gamma_J)_{k,j}$ if $(\hat\Gamma_J)_{k,j} \neq 0$ and $\|\hat Z_J\|_\infty \le 1$.
We define the normalization matrix

$$M := M_\lambda := \sqrt{n}\,\hat T_J^{-1/2}\,\tilde T_J. \qquad (10)$$

Definition 1. The de-sparsified estimator of $\beta_J^0$ is

$$\hat b_J := \hat\beta_J + \tilde T_J^{-1} (X_J - X_{-J}\hat\Gamma_J)^T (Y - X\hat\beta)/n,$$

with $\hat\beta$ the square-root Lasso given in (1), $\hat\Gamma_J$ the multivariate square-root Lasso given in
(6) and the matrix $\tilde T_J$ given in (7). The normalized de-sparsified estimator is $M\hat b_J$ with $M$
the normalization matrix given in (10).
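
Once an initial estimator and a matrix of regression coefficients of $X_J$ on $X_{-J}$ are available, the quantities of this section are plain matrix computations. The sketch below (NumPy; `desparsify` and `sym_inv_sqrt` are hypothetical names) takes both estimators as inputs and does not solve the square-root Lasso problems themselves. The usage example is a low-dimensional sanity check in which ordinary least squares stands in for the multivariate square-root Lasso; this substitution is an assumption made only to obtain a case with a known answer.

```python
import numpy as np

def sym_inv_sqrt(S):
    """S^{-1/2} for a symmetric positive definite matrix."""
    w, V = np.linalg.eigh(S)
    return (V / np.sqrt(w)) @ V.T

def desparsify(X, Y, J, beta_hat, Gamma_hat):
    """Return b_J, M, T_tilde, T_hat given beta_hat and Gamma_hat."""
    n, p = X.shape
    J = np.asarray(J)
    notJ = np.setdiff1d(np.arange(p), J)          # sorted complement of J
    resid_J = X[:, J] - X[:, notJ] @ Gamma_hat    # X_J - X_{-J} Gamma_hat
    T_tilde = resid_J.T @ X[:, J] / n
    T_hat = resid_J.T @ resid_J / n
    M = np.sqrt(n) * sym_inv_sqrt(T_hat) @ T_tilde
    correction = np.linalg.solve(T_tilde, resid_J.T @ (Y - X @ beta_hat) / n)
    return beta_hat[J] + correction, M, T_tilde, T_hat

# Low-dimensional, noise-free check with an OLS stand-in for Gamma_hat.
rng = np.random.default_rng(2)
X = rng.standard_normal((40, 5))
beta0 = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
Y = X @ beta0
J, notJ = [0, 2], [1, 3, 4]
Gamma_ols = np.linalg.lstsq(X[:, notJ], X[:, J], rcond=None)[0]
b_J, M, T_tilde, T_hat = desparsify(X, Y, J, np.zeros(5), Gamma_ols)
```

With the OLS stand-in the residuals of $X_J$ are orthogonal to $X_{-J}$, so the two matrices coincide and the one-step correction recovers $\beta_J^0$ exactly from the (poor) initial estimator zero. In the high-dimensional setting of the paper this orthogonality holds only approximately, through the KKT-conditions (9).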


3.2 The main result 


Our main result is rather simple. It shows that using the multivariate square-root Lasso
for de-sparsifying, and then normalizing, results in a well-scaled "asymptotic pivot" (up to
the estimation of $\sigma_0$ which we will do in the next section). Theorem 1 actually does not
require $\hat\beta$ to be the square-root Lasso but for definiteness we have made this specific choice
throughout the paper (except for Section 6).

Theorem 1. Consider the model $Y \sim \mathcal{N}_n(f^0, \sigma_0^2 I)$ where $f^0 = X\beta^0$. Let $\hat b_J$ be the
de-sparsified estimator given in Definition 1 and let $M\hat b_J$ be its normalized version. Then

$$M(\hat b_J - \beta_J^0)/\sigma_0 = \mathcal{N}_{|J|}(0, I) + \mathrm{rem},$$

where $\|\mathrm{rem}\|_\infty \le \sqrt{n}\,\lambda\,\|\hat\beta_{-J} - \beta_{-J}^0\|_1/\sigma_0$.

To make Theorem 1 work we need to bound $\|\hat\beta - \beta^0\|_1/\hat\sigma$ where $\hat\sigma$ is an estimator of $\sigma_0$.
This is done in Theorem 2 with $\hat\sigma$ the estimator $\|\hat\epsilon\|_n$ from the square-root Lasso. A special
case is presented in Lemma 5 which imposes weak sparsity conditions for $\beta^0$. Bounds for
$\sigma_0/\hat\sigma$ are also given.

Theorem 1 is about the case where the noise $\epsilon$ is i.i.d. normally distributed. This can be
generalized as from the proof we see that the "main" term is linear in $\epsilon$. For independent
errors with common variance $\sigma_0^2$ say, one needs to assume the Lindeberg condition for
establishing asymptotic normality.
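
When the remainder term is negligible and $\sigma_0$ is replaced by a consistent estimator $\hat\sigma$, the pivot suggests the asymptotic confidence set $\{\beta_J : \|M(\hat b_J - \beta_J)\|_2^2 \le \hat\sigma^2 c\}$ with $c$ the appropriate quantile of the $\chi^2$-distribution with $|J|$ degrees of freedom. A minimal sketch of the membership test follows (NumPy; the function name is hypothetical and the critical value is passed in by the caller). For two degrees of freedom the quantile has the closed form $-2\log(\alpha)$, which the example uses; the matrix $M$ and the numbers are made up for illustration.

```python
import numpy as np

def in_confidence_set(beta_J, b_J, M, sigma_hat, crit):
    """True if ||M (b_J - beta_J)||_2^2 / sigma_hat^2 <= crit."""
    diff = np.asarray(b_J, dtype=float) - np.asarray(beta_J, dtype=float)
    d = M @ diff
    return bool(float(d @ d) / sigma_hat**2 <= crit)

# 0.95-quantile of chi-squared with 2 degrees of freedom:
# P(chi2_2 > x) = exp(-x/2), hence the quantile is -2 log(0.05).
crit_2df = -2.0 * np.log(0.05)

# Toy inputs (assumed for illustration): M = sqrt(n) * I with n = 100.
M_toy = np.sqrt(100.0) * np.eye(2)
b_toy = np.array([1.0, 2.0])
inside = in_confidence_set([1.1, 2.1], b_toy, M_toy, 1.0, crit_2df)
outside = in_confidence_set([1.5, 2.0], b_toy, M_toy, 1.0, crit_2df)
```

The set is an ellipsoid centred at $\hat b_J$ whose shape is determined by $M^T M$.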
4 Theory for the square-root Lasso


Let $f^0 := \mathbb{E}Y$, $\beta^0$ be a solution of $X\beta^0 = f^0$ and define $\epsilon := Y - f^0$. Recall the square-root
Lasso $\hat\beta$ given in (1). It depends on the tuning parameter $\lambda_0 > 0$. In this section we develop
theoretical bounds, which are closely related to results in Sun and Zhang [2013] (who by
the way use the term scaled Lasso instead of square-root Lasso in that paper). There are
two differences. Firstly, our lower bound for the residual sum of squares of the square-root
Lasso requires, for the case where no conditions are imposed on the compatibility
constants, a smaller value for the tuning parameter (see Lemma 3). These compatibility
constants, given in Definition 2, are required only later for the oracle results. Secondly, we
establish an oracle inequality that is sharp (see Theorem 2 in Section 4.2 where we present
more details).


Write $\hat\epsilon := Y - X\hat\beta$ and $\hat\sigma^2 := \|\hat\epsilon\|_n^2$. We consider bounds in terms of $\|\epsilon\|_n^2$, the "empirical"
variance of the unobservable noise. This is a random quantity but under obvious conditions
it converges to its expectation $\sigma_0^2$. Another random quantity that appears in our bounds
is $\epsilon/\|\epsilon\|_n$, which is a random point on the $n$-dimensional unit sphere. We write

$$\hat R := \frac{\|X^T \epsilon\|_\infty}{n \|\epsilon\|_n}.$$

When all $X_j$ are normalized such that $\|X_j\|_n = 1$, the quantity $\hat R$ is the maximal "empirical"
correlation between noise and input variables. Under distributional assumptions $\hat R$ can be
bounded with large probability by some constant $R$. For completeness we work out the case
of i.i.d. normally distributed errors.


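Lemma 2. Let $\epsilon \sim \mathcal{N}_n(0, \sigma_0^2 I)$. Suppose the normalized case where $\|X_j\|_n = 1$ for all $j =
1,\ldots,p$. Let $\alpha_0$, $\underline\alpha$ and $\bar\alpha$ be given positive error levels such that $\alpha_0 + \underline\alpha + \bar\alpha < 1$ and
$\log(1/\underline\alpha) \le n/4$. Define

$$\underline\sigma^2 := \sigma_0^2 \Big( 1 - 2\sqrt{\frac{\log(1/\underline\alpha)}{n}} \Big), \qquad \bar\sigma^2 := \sigma_0^2 \Big( 1 + 2\sqrt{\frac{\log(1/\bar\alpha)}{n}} + \frac{2\log(1/\bar\alpha)}{n} \Big)$$

and

$$R := \sqrt{\frac{\log(2p/\alpha_0)}{n - 2\sqrt{n\log(1/\underline\alpha)}}}.$$

We have

$$\mathbb{P}(\|\epsilon\|_n \le \underline\sigma) \le \underline\alpha, \qquad \mathbb{P}(\|\epsilon\|_n \ge \bar\sigma) \le \bar\alpha$$

and

$$\mathbb{P}(\hat R \ge R \ \cup\ \|\epsilon\|_n \le \underline\sigma) \le \alpha_0 + \underline\alpha.$$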
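
To get a feeling for the size of these constants, they can be evaluated for concrete $n$ and $p$. The sketch below uses only the standard library. It assumes the constants have the chi-squared tail-bound form $\underline\sigma^2 = \sigma_0^2(1 - 2\sqrt{\log(1/\underline\alpha)/n})$, $\bar\sigma^2 = \sigma_0^2(1 + 2\sqrt{\log(1/\bar\alpha)/n} + 2\log(1/\bar\alpha)/n)$ and $R = \sqrt{\log(2p/\alpha_0)/(n - 2\sqrt{n\log(1/\underline\alpha)})}$; the function and argument names are hypothetical.

```python
import math

def lemma2_constants(n, p, sigma0, alpha0, alpha_low, alpha_up):
    """Return (sigma_low^2, sigma_up^2, R) under the assumed form of Lemma 2."""
    if math.log(1 / alpha_low) > n / 4:
        raise ValueError("requires log(1/alpha_low) <= n/4")
    s = math.log(1 / alpha_low)
    t = math.log(1 / alpha_up)
    sigma_low_sq = sigma0**2 * (1 - 2 * math.sqrt(s / n))
    sigma_up_sq = sigma0**2 * (1 + 2 * math.sqrt(t / n) + 2 * t / n)
    R = math.sqrt(math.log(2 * p / alpha0) / (n - 2 * math.sqrt(n * s)))
    return sigma_low_sq, sigma_up_sq, R

# Example with n = 400, p = 500 and all error levels equal to 1/p.
sigma_low_sq, sigma_up_sq, R = lemma2_constants(400, 500, 1.0, 1/500, 1/500, 1/500)
```

For these values $R$ is roughly 0.2, of the same order as $\sqrt{\log p/n}$.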


4.1 Preliminary lower and upper bounds for $\hat\sigma^2$


We now show that the estimator of the variance $\hat\sigma^2 = \|\hat\epsilon\|_n^2$, obtained by applying the
square-root Lasso, converges to the noise variance $\sigma_0^2$. The result holds without conditions
on compatibility constants (given in Definition 2). We do however need the $\ell_1$-sparsity
condition (11) on $\beta^0$. This condition will be discussed below in an asymptotic setup.
Lemma 3. Suppose that for some $0 < \eta < 1$, some $R > 0$ and some $\underline\sigma > 0$, we have

$$\lambda_0 (1 - \eta) \ge R$$

and

$$\lambda_0 \|\beta^0\|_1/\underline\sigma \le 2\Big( \sqrt{1 + (\eta/2)^2} - 1 \Big). \qquad (11)$$

Then on the set where $\hat R \le R$ and $\|\epsilon\|_n \ge \underline\sigma$ we have

$$\Big|\, \|\hat\epsilon\|_n/\|\epsilon\|_n - 1 \,\Big| \le \eta.$$


We remark here that the result of Lemma 3 is also useful when using a square-root
Lasso for constructing an asymptotic confidence interval for a single parameter, say $\beta_j^0$.
Assuming random design it can be applied to show that without imposing compatibility
conditions the residual variance of the square-root Lasso for the regression of $X_j$ on all
other variables $X_{-j}$ does not degenerate.

Asymptotics Suppose $\epsilon_1,\ldots,\epsilon_n$ are i.i.d. with finite variance $\sigma_0^2$. Then clearly $\|\epsilon\|_n/\sigma_0 \to 1$
in probability. The normalization in (11) by $\underline\sigma$, which can be taken more or less equal to
$\sigma_0$, makes sense if we think of the standardized model

$$\tilde Y = X\tilde\beta^0 + \tilde\epsilon,$$

with $\tilde Y = Y/\sigma_0$, $\tilde\beta^0 = \beta^0/\sigma_0$ and $\tilde\epsilon = \epsilon/\sigma_0$. The condition (11) is a condition on the
normalized $\tilde\beta^0$. The rate of growth assumed there is quite common. First of all, it is clear that
if $\|\beta^0\|_1$ is very large then the estimator is not very good because of the penalty on large
values of $\|\cdot\|_1$. The condition (11) is moreover closely related to standard assumptions in
compressed sensing. To explain this we first note that

$$\|\beta^0\|_1/\sigma_0 \le \sqrt{s_0}\,\|\beta^0\|_2/\sigma_0$$

when $s_0$ is the number of non-zero entries of $\beta^0$ (observe that $s_0$ is a scale free property of
$\beta^0$). The term $\|\beta^0\|_2/\sigma_0$ can be seen as a signal-to-noise ratio. Let us assume this
signal-to-noise ratio stays bounded. If $\lambda_0$ corresponds to the standard choice $\lambda_0 \asymp \sqrt{\log p/n}$ the
assumption (11) holds with $\eta = o(1)$ as soon as we assume the standard assumption $s_0 =
o(n/\log p)$.
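
The $\ell_1$-sparsity condition can be made concrete by evaluating its right-hand side, read here as $2(\sqrt{1 + (\eta/2)^2} - 1)$, which behaves like $\eta^2/4$ for small $\eta$. The sketch below uses only the standard library; the function names and the numerical inputs are assumptions for illustration.

```python
import math

def l1_sparsity_budget(eta):
    """Right-hand side of the l1-sparsity condition: 2 (sqrt(1 + (eta/2)^2) - 1)."""
    return 2.0 * (math.sqrt(1.0 + (eta / 2.0) ** 2) - 1.0)

def satisfies_l1_sparsity(lam0, beta0_l1, sigma_low, eta):
    """Check lam0 * ||beta0||_1 / sigma_low <= budget(eta)."""
    return lam0 * beta0_l1 / sigma_low <= l1_sparsity_budget(eta)

budget = l1_sparsity_budget(0.1)          # close to 0.1**2 / 4 = 0.0025
ok = satisfies_l1_sparsity(0.1, 0.02, 1.0, 0.1)
not_ok = satisfies_l1_sparsity(0.1, 0.03, 1.0, 0.1)
```

Because the budget is quadratic in $\eta$, a relative accuracy $\eta$ for the variance estimator requires $\lambda_0 \|\beta^0\|_1/\underline\sigma$ to be of order $\eta^2$.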


4.2 An oracle inequality for the square-root Lasso 


Our next result is an oracle inequality for the square-root Lasso. It is as the corresponding
result for the Lasso as established in Bickel et al. [2009]. The oracle inequality of Theorem
2 is sharp in the sense that there is a constant 1 in front of the approximation error
$\|X(\beta - \beta^0)\|_n^2$ in (12). This sharpness is obtained along the lines of arguments from Koltchinskii
et al. [2011], who prove sharp oracle inequalities for the Lasso and for matrix problems.
We further have extended the situation in order to establish an oracle inequality for the
$\ell_1$-estimation error $\|\hat\beta - \beta^0\|_1$ where we use arguments from van de Geer [2014] for the Lasso.
For the square-root Lasso, the paper Sun and Zhang [2013] also has oracle inequalities, but
these are not sharp.

Compatibility constants are introduced in van de Geer [2007]. They play a role in the
identifiability of $\beta^0$.


Definition 2. Let $L > 0$ and $S \subset \{1,\ldots,p\}$. The compatibility constant is

$$\hat\phi^2(L, S) = \min\Big\{ |S|\,\|X\beta\|_n^2 : \ \|\beta_S\|_1 = 1,\ \|\beta_{-S}\|_1 \le L \Big\}.$$

We recall the notation $S_\beta = \{j : \beta_j \neq 0\}$ appearing in (12).
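Theorem 2. Let $\lambda_0$ satisfy for some $R > 0$

$$\lambda_0 (1 - \eta) \ge R$$

and assume the $\ell_1$-sparsity (11) for some $0 < \eta < 1$ and $\underline\sigma > 0$, i.e.

$$\lambda_0 \|\beta^0\|_1/\underline\sigma \le 2\Big( \sqrt{1 + (\eta/2)^2} - 1 \Big).$$

Let $0 \le \delta < 1$ be arbitrary and define

$$\underline\lambda := \lambda_0 (1 - \eta) - R, \qquad \bar\lambda := \lambda_0 (1 + \eta) + R + \delta\underline\lambda$$

and

$$L := \frac{\bar\lambda}{(1 - \delta)\underline\lambda}.$$

Then on the set where $\hat R \le R$ and $\|\epsilon\|_n \ge \underline\sigma$, we have

$$2\delta\underline\lambda\,\|\hat\beta - \beta^0\|_1 \|\epsilon\|_n + \|X(\hat\beta - \beta^0)\|_n^2 \le \min_{S}\ \min_{\beta \in \mathbb{R}^p,\ S_\beta = S} \Big\{ 2\delta\underline\lambda\,\|\beta - \beta^0\|_1 \|\epsilon\|_n + \|X(\beta - \beta^0)\|_n^2 + \frac{\bar\lambda^2 |S|\,\|\epsilon\|_n^2}{\hat\phi^2(L, S)} \Big\}. \qquad (12)$$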


The result of Theorem 2 leads to a trade-off between the approximation error $\|X(\beta -
\beta^0)\|_n^2$, the $\ell_1$-error $\|\beta - \beta^0\|_1$ and the sparseness^1 $|S_\beta|$ (or rather the effective sparseness
$|S_\beta|/\hat\phi^2(L, S_\beta)$).


5 A bound for the $\ell_1$-estimation error under (weak) sparsity


In this section we assume

$$\sum_{j=1}^p |\beta_j^0|^r \le \rho_r^r, \qquad (13)$$

where $0 < r < 1$ and where $\rho_r > 0$ is a constant that is "not too large". This is sometimes
called weak sparsity as opposed to strong sparsity which requires "not too many" non-zero
coefficients $s_0 := \#\{\beta_j^0 \neq 0\}$. We start with bounding the right hand side of the oracle
inequality (12) in Theorem 2.

We let $S_0 := S_{\beta^0}$ be the active set $S_0 := \{j : \beta_j^0 \neq 0\}$ of $\beta^0$ and let $\hat\Lambda^2_{\max}(S_0)$ be the largest
eigenvalue of $X_{S_0}^T X_{S_0}/n$. The cardinality of $S_0$ is denoted by $s_0 = |S_0|$. We assume in this
section the normalization $\|X_j\|_n = 1$ so that $\hat\Lambda_{\max}(S_0) \ge 1$ and $\hat\phi(L, S) \le 1$ for any
$L$ and $S$.


^1 or non-sparseness actually
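Lemma 4. Suppose $\beta^0$ satisfies the weak sparsity condition (13) for some $0 < r < 1$ and
$\rho_r > 0$. For any positive $\delta$, $\underline\lambda$, $\bar\lambda$ and $L$

$$\min_{S}\ \min_{\beta \in \mathbb{R}^p,\ S_\beta = S} \Big\{ 2\delta\underline\lambda\,\|\beta - \beta^0\|_1 \|\epsilon\|_n + \|X(\beta - \beta^0)\|_n^2 + \frac{\bar\lambda^2 |S|\,\|\epsilon\|_n^2}{\hat\phi^2(L, S)} \Big\} \le 2\bar\lambda^{2-r} \Big( \frac{\delta\underline\lambda}{\bar\lambda} + \frac{\hat\Lambda^r_{\max}(S_0)}{\hat\phi^2(L, S_*)} \Big) \Big( \frac{\rho_r}{\|\epsilon\|_n} \Big)^r \|\epsilon\|_n^2,$$

where $S_* := \{j : |\beta_j^0| > \bar\lambda\|\epsilon\|_n/\hat\Lambda_{\max}(S_0)\}$.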


As a consequence, we obtain bounds for the prediction error and $\ell_1$-error of the square-root
Lasso under (weak) sparsity. We only present the bound for the $\ell_1$-error as this is what we
need in Theorem 1 for the construction of asymptotic confidence sets.

To avoid being carried away by all the constants, we make some arbitrary choices in Lemma
5: we set $\eta \le 1/3$ in the $\ell_1$-sparsity condition (11) and we set $\lambda_0(1 - \eta) = 2R$. We choose
$\delta = 1/7$.
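
These choices can be traced through the constants of Theorem 2, read here as $\underline\lambda = \lambda_0(1-\eta) - R$, $\bar\lambda = \lambda_0(1+\eta) + R + \delta\underline\lambda$ and $L = \bar\lambda/((1-\delta)\underline\lambda)$. The sketch below (standard library only, hypothetical function names) shows that with $\lambda_0(1-\eta) = 2R$ and $\delta = 1/7$ one gets $\underline\lambda = R$, and that $L$ reaches the value 6 at $\eta = 1/3$ and is smaller for smaller $\eta$, which is where the constants 6 in the bounds below come from.

```python
def oracle_constants(lam0, eta, R, delta):
    """Return (lam_low, lam_up, L) as defined in Theorem 2 (assumed reading)."""
    lam_low = lam0 * (1 - eta) - R
    lam_up = lam0 * (1 + eta) + R + delta * lam_low
    L = lam_up / ((1 - delta) * lam_low)
    return lam_low, lam_up, L

def lemma5_constants(R, eta):
    """Constants under the choices lam0 (1 - eta) = 2 R and delta = 1/7."""
    lam0 = 2 * R / (1 - eta)
    return oracle_constants(lam0, eta, R, delta=1 / 7)

R_example = 0.2                                   # illustrative value
low_a, up_a, L_a = lemma5_constants(R_example, 1 / 3)
low_b, up_b, L_b = lemma5_constants(R_example, 0.1)
```

At $\eta = 1/3$ one finds $\bar\lambda = 36R/7 \le 6R$ and $\bar\lambda/(\delta\underline\lambda) = 36$.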

We include the confidence statements that are given in Lemma 2 to complete the picture.

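Lemma 5. Suppose $\epsilon \sim \mathcal{N}_n(0, \sigma_0^2 I)$. Let $\alpha_0$ and $\underline\alpha$ be given positive error levels such that
$\alpha_0 + \underline\alpha < 1$ and $\log(1/\underline\alpha) \le n/4$. Define

$$\underline\sigma^2 := \sigma_0^2 \Big( 1 - 2\sqrt{\frac{\log(1/\underline\alpha)}{n}} \Big), \qquad R := \sqrt{\frac{\log(2p/\alpha_0)}{n - 2\sqrt{n\log(1/\underline\alpha)}}}.$$

Assume the $\ell_1$-sparsity condition

$$R\,\|\beta^0\|_1/\underline\sigma \le (1 - \eta)\Big( \sqrt{1 + (\eta/2)^2} - 1 \Big), \quad \text{where } 0 < \eta \le 1/3,$$

and the $\ell_r$-sparsity condition (13) for some $0 < r < 1$ and $\rho_r > 0$. Set

$$S_* := \{j : |\beta_j^0| > 3R\underline\sigma/\hat\Lambda_{\max}(S_0)\}.$$

Then for $\lambda_0(1 - \eta) = 2R$, with probability at least $1 - \alpha_0 - \underline\alpha$ we have the $\ell_r$-sparsity based
bound

$$(1 - \eta)\frac{\|\hat\beta - \beta^0\|_1}{\hat\sigma} \le \frac{\|\hat\beta - \beta^0\|_1}{\|\epsilon\|_n} \le (6R)^{1-r} \Big( 1 + \frac{6^2 \hat\Lambda^r_{\max}(S_0)}{\hat\phi^2(6, S_*)} \Big) \Big( \frac{\rho_r}{\underline\sigma} \Big)^r,$$

the $\ell_0$-sparsity based bound

$$(1 - \eta)\frac{\|\hat\beta - \beta^0\|_1}{\hat\sigma} \le \frac{\|\hat\beta - \beta^0\|_1}{\|\epsilon\|_n} \le 3R \Big( \frac{6^2 s_0}{\hat\phi^2(6, S_0)} \Big)$$

and moreover the following lower bound for the estimator $\hat\sigma$ of the noise level:

$$(1 - \eta)\,\sigma_0/\hat\sigma \le \Big( 1 - 2\sqrt{\frac{\log(1/\underline\alpha)}{n}} \Big)^{-1/2}.$$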


Asymptotics Application of Theorem 1 with $\sigma_0$ estimated by $\hat\sigma$ requires that $\sqrt{n}\,\lambda\,\|\hat\beta -
\beta^0\|_1/\hat\sigma$ tends to zero in probability. Taking $\lambda \asymp \sqrt{\log p/n}$ and for example $\alpha_0 = \underline\alpha = 1/p$,
we see that this is the case under the conditions of Lemma 5 as soon as for some $0 < r < 1$
the following $\ell_r$-sparsity based bound holds:

$$\Big( \frac{\hat\Lambda^r_{\max}(S_0)}{\hat\phi^2(6, S_*)} \Big) \Big( \frac{\rho_r}{\sigma_0} \Big)^r = \frac{o(n/\log p)^{\frac{1-r}{2}}}{(\log p)^{\frac{1}{2}}}.$$

Alternatively, one may require the $\ell_0$-sparsity based bound

$$\frac{s_0}{\hat\phi^2(6, S_0)} = \frac{o(n/\log p)^{\frac{1}{2}}}{(\log p)^{\frac{1}{2}}}.$$


6 Structured sparsity 


We will now show that the results hold for norm-penalized estimators with norms other than
$\ell_1$. Let $\Omega$ be some norm on $\mathbb{R}^{p-|J|}$ and define for a $(p - |J|) \times |J|$ matrix $A := (a_1,\ldots,a_{|J|})$

$$\|A\|_{1,\Omega} := \sum_{j=1}^{|J|} \Omega(a_j).$$

For a vector $z \in \mathbb{R}^{p-|J|}$ we define the dual norm

$$\Omega_*(z) = \sup_{\Omega(a) \le 1} |z^T a|,$$

and for a $(p - |J|) \times |J|$ matrix $Z = (z_1,\ldots,z_{|J|})$ we let

$$\|Z\|_{\infty,\Omega_*} = \max_{1 \le j \le |J|} \Omega_*(z_j).$$

Thus, when $\Omega$ is the $\ell_1$-norm we have $\|A\|_{1,\Omega} = \|A\|_1$ and $\|Z\|_{\infty,\Omega_*} = \|Z\|_\infty$. We let the
multivariate square-root $\Omega$-sparse estimator be

$$\hat\Gamma_J := \arg\min_{\Gamma_J} \Big\{ \|X_J - X_{-J}\Gamma_J\|_{\mathrm{nuclear}}/\sqrt{n} + \lambda \|\Gamma_J\|_{1,\Omega} \Big\}.$$

This estimator equals (6) when $\Omega$ is the $\ell_1$-norm.

We let, as in (7), (8) and (10) but now with the new $\hat\Gamma_J$, the quantities $\tilde T_J$, $\hat T_J$ and $M$ be
defined as

$$\tilde T_J := (X_J - X_{-J}\hat\Gamma_J)^T X_J/n,$$

$$\hat T_J := (X_J - X_{-J}\hat\Gamma_J)^T (X_J - X_{-J}\hat\Gamma_J)/n$$

and

$$M := M_\lambda := \sqrt{n}\,\hat T_J^{-1/2}\,\tilde T_J.$$

The $\Omega$-de-sparsified estimator of $\beta_J^0$ is as in Definition 1

$$\hat b_J := \hat\beta_J + \tilde T_J^{-1} (X_J - X_{-J}\hat\Gamma_J)^T (Y - X\hat\beta)/n$$

but now with $\hat\beta$ not necessarily the square-root Lasso but a suitably chosen initial estimator
and with $\hat\Gamma_J$ the multivariate square-root $\Omega$-sparse estimator. The normalized de-sparsified
estimator is $M\hat b_J$ with normalization matrix $M$ given above. We can then easily derive the
following extension of Theorem 1.
Theorem 3. Consider the model $Y \sim \mathcal{N}_n(f^0, \sigma_0^2 I)$ where $f^0 = X\beta^0$. Let $\hat b_J$ be the
$\Omega$-de-sparsified estimator depending on some initial estimator $\hat\beta$. Let $M\hat b_J$ be its normalized
version. Then

$$M(\hat b_J - \beta_J^0)/\sigma_0 = \mathcal{N}_{|J|}(0, I) + \mathrm{rem},$$

where $\|\mathrm{rem}\|_\infty \le \sqrt{n}\,\lambda\,\Omega(\hat\beta_{-J} - \beta_{-J}^0)/\sigma_0$.


We see from Theorem 3 that confidence sets follow from fast rates of convergence of the
$\Omega$-estimation error. The latter is studied in Bach [2010], Obozinski and Bach [2012] and
van de Geer [2014] for the case where the initial estimator is the least squares estimator
with penalty based on a sparsity inducing norm $\bar\Omega$ (say). Group sparsity (Yuan and Lin
[2006]) is an example which we shall now briefly discuss.


Example 1. Let $G_1,\ldots,G_T$ be given mutually disjoint subsets of $\{1,\ldots,p\}$ and take as
sparsity-inducing norm




t =t 


The group Lasso is the minimizer of least squares loss with penalty proportional to $\bar\Omega$.
Oracle inequalities for the $\bar\Omega$-error of the group Lasso have been derived in Lounici et al.
[2011] for example. For the square-root version we refer to Bunea et al. [2013]. With group
sparsity, it lies at hand to consider confidence sets for one of the groups $G_t$, i.e., to take
$J = G_{t_0}$ for a given $t_0$. Choosing


i Ao 


a G R p |c,, o 


will ensure that $\Omega(\hat\beta_{-G_{t_0}} - \beta^0_{-G_{t_0}}) \le \bar\Omega(\hat\beta - \beta^0)$ which gives one a handle to control the
remainder term in Theorem 3. This choice of $\Omega$ for constructing the confidence set makes
sense if one believes that the group structure describing the relation between the response
$Y$ and the input $X$ is also present in the relation between $X_{G_{t_0}}$ and $X_{-G_{t_0}}$.


7 Simulations 


Here we denote by $Y = XB_0 + \epsilon$ an arbitrary linear multivariate regression. In a similar
fashion to the square-root algorithm in Bunea et al. (2014) we propose the following
algorithm for the multivariate square-root Lasso:


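Algorithm 1 msrL

Require: Take a constant $K$ big enough, and choose an arbitrary starting matrix $B(0) \in \mathbb{R}^{p \times q}$
1: $Y \leftarrow Y/K$
2: $X \leftarrow X/K$
3: for $t = 0, 1, 2, \ldots, t_{\mathrm{stop}}$ do
4:     $B(t+1) := \Phi\big( B(t) + X^T (Y - XB(t));\ \lambda \|Y - XB(t)\|_{\mathrm{nuclear}} \big)$
5: return $B(t_{\mathrm{stop}} + 1)$

Here we denote

$$\Phi(a; \eta) := \begin{cases} 0, & \text{if } a = 0, \\ \dfrac{a}{\|a\|_2}\,(\|a\|_2 - \eta)_+, & \text{otherwise.} \end{cases}$$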
The value $t_{\mathrm{stop}}$ can be chosen in such a way that one gets the desired accuracy for the
algorithm. This algorithm is based on a fixed-point equation from the KKT-conditions. The
square-root Lasso is calculated via the algorithm in Bunea et al. (2014).

We consider the usual linear regression model:

$$Y = X\beta + \epsilon.$$

In our simulations we take a design matrix $X$, where the rows are fixed i.i.d. realizations
from $\mathcal{N}(0, \Sigma)$. We have $n$ observations and $p$ explanatory variables. The covariance matrix
$\Sigma$ has the following Toeplitz structure: $\Sigma_{i,j} = 0.9^{|i-j|}$. The errors are i.i.d. Gaussian
distributed, with variance $\sigma^2 = 1$. A set of indices $J$ is also chosen; $J$ denotes the set of indices
of the parameter vector $\beta$ that we want to find asymptotic group confidence intervals for.
Define $q = |J|$ as the number of indices of interest. For each different setting of $p$ and $n$ we
do $r = 1000$ simulations. In each repetition we calculate the test statistic. A significance
level of 0.05 is chosen. The lambda of the square-root Lasso, $\lambda_{\mathrm{srL}}$, in the simulations is the
theoretical lambda scaled by 3. For the lambda of the multivariate square-root Lasso,
$\lambda_{\mathrm{msrL}}$, we do not have theoretical results yet. That is why we use cross-validation, where we
minimize the error expressed in nuclear norm, to define $\lambda_{\mathrm{msrL}}$. It is important to note that
the choice of $\lambda_{\mathrm{msrL}}$ is crucial, especially for cases where $n$ is small. One could tune
$\lambda_{\mathrm{msrL}}$ in such a fashion that even cases like $n = 100$ work much better, see Figure 2. But the
point here is to see what happens to the chi-squared test statistic with a fixed rule for the
choice of $\lambda_{\mathrm{msrL}}$. This basic setup is used throughout all the simulations below.
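
The data-generating part of this setup is easy to reproduce. The following sketch (NumPy) draws a design with Toeplitz covariance $0.9^{|i-j|}$ and Gaussian errors with unit variance. The function names, the seed, the particular index set and the way the non-zero coefficients are drawn (uniformly on $[1,4]$ on the chosen indices, zero elsewhere) are assumptions of this illustration; it does not compute either square-root Lasso or the test statistic.

```python
import numpy as np

def toeplitz_cov(p, rho=0.9):
    """Covariance matrix with entries rho^{|i-j|}."""
    idx = np.arange(p)
    return rho ** np.abs(idx[:, None] - idx[None, :])

def simulate(n, p, J, rho=0.9, sigma=1.0, seed=0):
    """Draw X with N(0, Sigma) rows and Y = X beta + eps."""
    rng = np.random.default_rng(seed)
    chol = np.linalg.cholesky(toeplitz_cov(p, rho))
    X = rng.standard_normal((n, p)) @ chol.T      # rows have covariance Sigma
    beta = np.zeros(p)
    beta[np.asarray(J)] = rng.uniform(1.0, 4.0, size=len(J))
    Y = X @ beta + sigma * rng.standard_normal(n)
    return X, Y, beta

J_example = [0, 2, 3, 7, 9, 32]   # hypothetical zero-based index set
X_sim, Y_sim, beta_sim = simulate(n=100, p=500, J=J_example)
```

The Cholesky factor is used so that each row of the design has exactly the covariance $\Sigma$ in distribution; any other square root of $\Sigma$ would serve equally well.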


7.1 Asymptotic distribution 


First let us look at the question of what the histogram of the test statistic looks like for
different $n$. Here we use $p = 500$ and $q = 6$ with $J = (1,3,4,8,10,33)$, where the entries in
$\beta_J$ are chosen randomly from a Uniform distribution on $[1,4]$. We also specify $\beta_{-J}$ to be
the zero vector. So the set $J$ that we are interested in is in fact the same set as the active
set of $\beta$. Furthermore $p - q$ gives the amount of sparsity. Here we look at a sequence of
$n = 100, 200, 300, 400, 500, 600, 800$. As above, for each setting we calculate 1000
simulations. For each setting we plot the histogram for the test statistic and compare it with the
theoretical chi-squared distribution on $q = 6$ degrees of freedom. Figure 1 and Figure 2
show the results.


The histograms show that with increasing $n$, we get a fast convergence to the true
asymptotic chi-squared distribution. It is in fact true that we could multiply the test statistic with
a constant $C_n < 1$ in order to get the histogram to match the chi-squared distribution. This
reflects the theory. Already with $n = 400$ we get a very good approximation of the chi-squared
distribution. But we see that the tuning of $\lambda_{\mathrm{msrL}}$ is crucial for small $n$, see Figure 2.


Next we try the same procedure but we are interested in what happens if we let $J$ and the
active set $S_0$ not be the same set. Here we take $J = (1,3,4,8,10,33)$ and $\beta^0$ is taken from the
uniform distribution on $[1,4]$ on the set $S_0 = (2,3,5,8,11,12,14,31)$. So only the indices
$J \cap S_0 = \{3, 8\}$ coincide. Figure 3 shows the results.


Compared to the case where $J$ and $S_0$ are the same set, it seems that this setting can
handle small $n$ better than the case where all the elements of $J$ are the non-zero
indices of $\beta^0$. So the previous case seems to be the harder case. Therefore we stick with
$J = S_0 = (1,3,4,8,10,33)$ for all the other simulations in Subsections 7.2 and 7.3.
Fig. 1 Histogram of the test statistic, l = $\lambda$, the cross-validation lambda (panel title: n=500, r=1000, l=1.2)

Fig. 2 Histogram of the test statistic with a tuned $\lambda_{\mathrm{msrL}}$ = l = 1.2 for $n = 100$ (panel title: n=100, r=1000, l=1.2)

7.2 Confidence level for an increasing $\lambda_{\mathrm{msrL}}$


Up until now we have not looked at the behaviour for different $\lambda$. We only used the
cross-validation $\lambda$. So here we look at $n = 400$, $p = 500$ and we take $\lambda_{\mathrm{msrL}}$ along a fixed
sequence $0.01, 0.11, 0.21, \ldots$. Figure 4 shows the results.

If we take $\lambda$ too low the behaviour breaks down. On the other hand, if $\lambda$ is too big, we will
not achieve a good average confidence level. The cross-validation $\lambda$ seems to be still a bit
too high. So the cross-validation $\lambda$ could be better.
Fig. 3 Histogram of the test statistic, l = $\lambda$, the cross-validation lambda (panel title: n=500, r=1000, l=NA)

Fig. 4 Average confidence level with fixed $n = 400$ and $p = 500$, for increasing $\lambda$ (plot title: Confidence level depending on $\lambda$; horizontal axis $\lambda$ from 0 to 4, cross-validation value marked at cv=1.4)
7.3 Levelplot for n and p

Now let us look at an overview of a lot of different settings. We will use the levelplot to
present the results. Here we use the cross-validation $\lambda$. We let $n$ and $p$ increase and look
again at the average coverage of the confidence interval (average over the 1000 simulations
for each gridpoint). The border between high- and low-dimensional cases is marked by the
white line in Figure 5. Increasing $p$ does not worsen the procedure too much, which is very
good. And, as expected, increasing the number of observations $n$ increases "the accuracy"
of the average confidence interval.



8 Discussion 


We have presented a method for constructing confidence sets for groups of variables which 
does not impose sparsity conditions on the input matrix X. The idea is to use a loss function 
based on the nuclear norm of the matrix of residuals. We called this the multivariate square- 
root Lasso as it is an extension of the square-root Lasso in the multivariate case. 
It is easy to see that when the groups are large, one needs the $\ell_2$-norm of the remainder
term $\|\mathrm{rem}\|_2$ in Theorem 1 to be of small order $|J|^{1/4}$ in probability, using the
representation $\chi^2_{|J|} = |J| + O_P(\sqrt{|J|})$. This leads to the requirement that
$\sqrt{n}\lambda\|\hat\beta_{-J} - \beta^0_{-J}\|_1/\sigma_0 = o_P(1/|J|^{1/4})$, i.e., that it
decreases faster for large groups. The paper Mitra and Zhang [2014] introduces a different
scheme for confidence sets, where there is no dependence
on group size in the remainder term after the normalization for large groups. Their idea is
to use a group Lasso with a nuclear norm type of penalty on $\hat\Gamma_J$ instead of the
$\ell_1$-norm $\|\hat\Gamma_J\|_1$ as we do in Theorem 1. Combining the approach of Mitra and
Zhang [2014] with the result of Theorem 3 leads to a new remainder term which after
normalization for large groups
does not depend on group size and does not rely on sparsity assumptions on the design $X$.
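
The representation $\chi^2_{|J|} = |J| + O_P(\sqrt{|J|})$ used above can be illustrated numerically. The following is a minimal sketch, not part of the original study: the function name `chi2_standardized_spread`, the group sizes and the number of draws are all illustrative assumptions. It draws chi-square variables, centers them at $|J|$ and scales by $\sqrt{|J|}$; the resulting spread should stay bounded (close to $\sqrt{2}$) as the group size grows.

```python
import numpy as np

def chi2_standardized_spread(group_size, n_draws=20000, seed=0):
    """Mean and standard deviation of (chi2_{|J|} - |J|) / sqrt(|J|)."""
    rng = np.random.default_rng(seed)
    draws = rng.chisquare(df=group_size, size=n_draws)
    centered = (draws - group_size) / np.sqrt(group_size)
    return centered.mean(), centered.std()

# Illustrative group sizes; the spread should not grow with the group size.
results = {size: chi2_standardized_spread(size) for size in (10, 100, 1000)}
for size, (mean, sd) in results.items():
    print(f"|J| = {size}: mean = {mean:.3f}, sd = {sd:.3f}")
```

Since the fluctuation of the pivot is of order $\sqrt{|J|}$, a remainder whose squared norm is of smaller order than $\sqrt{|J|}$ does not disturb the approximation, which is where the rate $|J|^{1/4}$ comes from.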


The choice of the tuning parameter $\lambda$ for the construction used in Theorem 1 is as yet
an open problem. When one is willing to assume certain sparsity assumptions such that a
bound for $\|\hat\beta - \beta^0\|_1$ is available, the tuning parameter can be chosen by trading off the size
of the confidence set and the bias. When the rows of $X$ are i.i.d. random variables, a choice
for $\lambda$ of order $\sqrt{\log p/n}$ is theoretically justified under certain conditions. Finally, smaller
$\lambda$ give more conservative confidence intervals. Thus, increasing $\lambda$ will give one a "solution
path" of significant variables entering and exiting, where the number of "significant"
variables increases. If one aims at finding potentially important variables, one might want
to choose a cut-off level here, i.e. choose $\lambda$ in such a way that the number of "significant"
variables is equal to a prescribed number. However, we have as yet no theory showing such
a data-dependent choice of $\lambda$ is meaningful.


A given value for $\lambda$ may yield sets which do not have the approximate coverage. These
sets can nevertheless be viewed as giving a useful importance measure for the variables,
an importance measure which avoids the possible problems of other methods for assessing
accuracy. For example, when applied to all variables (after grouping) the confidence sets
clearly also give results for the possibly weak variables. This is in contrast to post-model
selection where the variables not selected are no longer under consideration.


9 Proofs 


9.1 Proof for the result for the multivariate square-root Lasso in Subsection 2.2

Proof of Lemma 1. Let us write, for each $p \times q$ matrix $B$, the residuals as
$\Sigma(B) := (Y - XB)^T (Y - XB)/n$. Let $\Sigma_{\min}(B)$ be the minimizer of

$$\mathrm{trace}(\Sigma(B)\Sigma^{-1/2}) + \mathrm{trace}(\Sigma^{1/2}) \tag{14}$$

over $\Sigma$. Then $\Sigma_{\min}(B)$ equals $\Sigma(B)$. To see this we invoke the reparametrization
$\Omega := \Sigma^{-1/2}$ so that $\Sigma^{1/2} = \Omega^{-1}$. We now minimize

$$\mathrm{trace}(\Sigma(B)\Omega) + \mathrm{trace}(\Omega^{-1})$$

over $\Omega > 0$. The matrix derivative with respect to $\Omega$ of $\mathrm{trace}(\Sigma(B)\Omega)$ is $\Sigma(B)$. The matrix
derivative of $\mathrm{trace}(\Omega^{-1})$ with respect to $\Omega$ is equal to $-\Omega^{-2}$. Hence the minimizer
$\Omega_{\min}(B)$ satisfies the equation

$$\Sigma(B) - \Omega_{\min}^{-2}(B) = 0,$$

giving

$$\Omega_{\min}(B) = \Sigma^{-1/2}(B),$$

so that

$$\Sigma_{\min}(B) = \Omega_{\min}^{-2}(B) = \Sigma(B).$$

Inserting this solution back in (14) gives $2\,\mathrm{trace}(\Sigma^{1/2}(B))$, which is equal to
$2\|Y - XB\|_{\mathrm{nuclear}}/\sqrt{n}$. This proves the first part of the lemma.
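
The first part of the lemma can be checked numerically. The sketch below is only an illustration under assumed random data: the dimensions, the seed and the helper names `psd_power` and `objective` are hypothetical and not from the paper. It evaluates criterion (14) at $\Sigma = \Sigma(B)$, compares the value with twice the nuclear norm of the residual matrix scaled by $1/\sqrt{n}$ (the scaling comes from the factor $1/n$ in $\Sigma(B)$), and confirms that perturbed positive definite candidates do not give a smaller value.

```python
import numpy as np

def psd_power(mat, power):
    """Power of a symmetric positive definite matrix via its eigendecomposition."""
    eigvals, eigvecs = np.linalg.eigh(mat)
    return (eigvecs * eigvals**power) @ eigvecs.T

def objective(sigma_b, sigma):
    """Criterion (14): trace(Sigma(B) Sigma^{-1/2}) + trace(Sigma^{1/2})."""
    return np.trace(sigma_b @ psd_power(sigma, -0.5)) + np.trace(psd_power(sigma, 0.5))

# Illustrative random data (assumed sizes, not the paper's simulation design).
rng = np.random.default_rng(1)
n, p, q = 50, 8, 3
X = rng.standard_normal((n, p))
Y = rng.standard_normal((n, q))
B = rng.standard_normal((p, q))

residuals = Y - X @ B
sigma_b = residuals.T @ residuals / n

value_at_minimizer = objective(sigma_b, sigma_b)
nuclear_value = 2 * np.linalg.norm(residuals, ord="nuc") / np.sqrt(n)

# Perturbed positive definite candidates should not beat Sigma = Sigma(B).
perturbed_values = []
for _ in range(200):
    noise = 0.1 * rng.standard_normal((q, q))
    candidate = sigma_b + noise @ noise.T  # remains positive definite
    perturbed_values.append(objective(sigma_b, candidate))

print(value_at_minimizer, nuclear_value, min(perturbed_values))
```

The check relies on the convexity of $\Omega \mapsto \mathrm{trace}(\Sigma(B)\Omega) + \mathrm{trace}(\Omega^{-1})$ on positive definite matrices, so the stationary point found in the proof is a global minimizer.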

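Let now for each $\Sigma > 0$, $B(\Sigma)$ be the minimizer of

$$\mathrm{trace}(\Sigma(B)\Sigma^{-1/2}) + 2\lambda_0\|B\|_1.$$

By sub-differential calculus we have

$$X^T(Y - XB(\Sigma))\Sigma^{-1/2}/n = \lambda_0 Z(\Sigma)$$

where $\|Z(\Sigma)\|_\infty \le 1$ and $Z_{k,j}(\Sigma) = \mathrm{sign}(B_{k,j}(\Sigma))$ if $B_{k,j}(\Sigma) \ne 0$ ($k = 1,\ldots,p$, $j = 1,\ldots,q$).
The KKT-conditions follow from $\hat B = B(\hat\Sigma)$. □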


9.2 Proof of the main result in Subsection 3.2 


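Proof of Theorem 1. We have

$$\begin{aligned}
M(\hat b_J - \hat\beta_J) &= \hat T_J^{-1/2}(X_J - X_{-J}\hat\Gamma_J)^T\epsilon/\sqrt{n} - \hat T_J^{-1/2}(X_J - X_{-J}\hat\Gamma_J)^T X(\hat\beta - \beta^0)/\sqrt{n} \\
&= \hat T_J^{-1/2}(X_J - X_{-J}\hat\Gamma_J)^T\epsilon/\sqrt{n} - \hat T_J^{-1/2}(X_J - X_{-J}\hat\Gamma_J)^T X_J(\hat\beta_J - \beta^0_J)/\sqrt{n} \\
&\quad - \hat T_J^{-1/2}(X_J - X_{-J}\hat\Gamma_J)^T X_{-J}(\hat\beta_{-J} - \beta^0_{-J})/\sqrt{n} \\
&= \hat T_J^{-1/2}(X_J - X_{-J}\hat\Gamma_J)^T\epsilon/\sqrt{n} - M(\hat\beta_J - \beta^0_J) - \sqrt{n}\lambda\hat Z_J^T(\hat\beta_{-J} - \beta^0_{-J})
\end{aligned}$$

where we invoked the KKT-conditions (9). We thus arrive at

$$M(\hat b_J - \beta^0_J) = \hat T_J^{-1/2}(X_J - X_{-J}\hat\Gamma_J)^T\epsilon/\sqrt{n} + \sigma_0\,\mathrm{rem}, \tag{15}$$

where

$$\mathrm{rem} = -\sqrt{n}\lambda\hat Z_J^T(\hat\beta_{-J} - \beta^0_{-J})/\sigma_0.$$

The covariance matrix of the first term $\hat T_J^{-1/2}(X_J - X_{-J}\hat\Gamma_J)^T\epsilon/\sqrt{n}$ in (15) is equal to

$$\sigma_0^2\,\hat T_J^{-1/2}(X_J - X_{-J}\hat\Gamma_J)^T(X_J - X_{-J}\hat\Gamma_J)\hat T_J^{-1/2}/n = \sigma_0^2 I$$

where $I$ is the identity matrix with dimensions $|J| \times |J|$. It follows that this term is
$|J|$-dimensional standard normal scaled with $\sigma_0$. The remainder term can be bounded using the
dual norm inequality for each entry:

$$|\mathrm{rem}_j| \le \sqrt{n}\lambda\max_{k \notin J}|(\hat Z_J)_{k,j}|\,\|\hat\beta_{-J} - \beta^0_{-J}\|_1/\sigma_0 \le \sqrt{n}\lambda\|\hat\beta_{-J} - \beta^0_{-J}\|_1/\sigma_0$$

since by the KKT-conditions (9), we have $\|\hat Z_J\|_\infty \le 1$. □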
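
Two steps of this proof are purely algebraic and can be checked numerically: the normalization by $\hat T_J^{-1/2}$ turns the covariance of the leading term into the identity, and the dual norm inequality bounds every entry of the remainder. The sketch below is an illustration under assumptions: `gamma_hat` is a hypothetical stand-in for the multivariate square-root Lasso estimate (the identity holds for any choice), and `Z` and `delta` are arbitrary placeholders for the KKT matrix and the estimation error of the nuisance coefficients.

```python
import numpy as np

def inverse_sqrt(mat):
    """Inverse square root of a symmetric positive definite matrix."""
    eigvals, eigvecs = np.linalg.eigh(mat)
    return (eigvecs / np.sqrt(eigvals)) @ eigvecs.T

rng = np.random.default_rng(2)
n, p, group = 60, 20, 4
X = rng.standard_normal((n, p))
X_J, X_minus_J = X[:, :group], X[:, group:]

# Hypothetical stand-in for the estimated coefficient matrix; not a fitted Lasso.
gamma_hat = 0.05 * rng.standard_normal((p - group, group))

residual = X_J - X_minus_J @ gamma_hat
T_hat = residual.T @ residual / n
T_inv_sqrt = inverse_sqrt(T_hat)

# Covariance factor of the leading term: should equal the |J| x |J| identity.
whitened_cov = T_inv_sqrt @ residual.T @ residual @ T_inv_sqrt / n

# Dual norm inequality: with sup-norm of Z at most 1, |Z^T delta|_j <= ||delta||_1.
delta = rng.standard_normal(p - group)
Z = rng.uniform(-1.0, 1.0, size=(p - group, group))
entries = Z.T @ delta
l1_bound = np.abs(delta).sum()

print(np.round(whitened_cov, 6))
print(np.abs(entries).max(), l1_bound)
```

The whitening identity does not use sparsity of the design at all, which matches the claim that the pivot's leading term is exactly standard normal after scaling by $\sigma_0$.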
9.3 Proofs of the theoretical result for the square-root Lasso in Section 5


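Proof of Lemma 2. Without loss of generality we can assume $\sigma_0 = 1$. From Laurent and
Massart [2000] we know that for all $t > 0$

$$P\left(\|\epsilon\|_n^2 \le 1 - 2\sqrt{t/n}\right) \le \exp[-t]$$

and

$$P\left(\|\epsilon\|_n^2 \ge 1 + 2\sqrt{t/n} + 2t/n\right) \le \exp[-t].$$

Apply this with $t = \log(1/\underline\alpha)$ and $t = \log(1/\bar\alpha)$ respectively. Moreover
$X_j^T\epsilon/n \sim \mathcal{N}(0, 1/n)$ for all $j$. Hence for all $t > 0$

$$P\left(|X_j^T\epsilon|/n \ge \sqrt{2t/n}\right) \le 2\exp[-t], \quad \forall j.$$

It follows that

$$P\left(\|X^T\epsilon\|_\infty/n \ge \sqrt{2(t + \log(2p))/n}\right) \le \exp[-t].$$

□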
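
The two chi-square tail bounds quoted from Laurent and Massart [2000] can be illustrated by simulation. This is a rough Monte Carlo sketch with $\sigma_0 = 1$; the function name `tail_frequencies`, the sample size, the value of $t$ and the number of replications are illustrative assumptions. The empirical frequencies of both events should stay below $\exp[-t]$.

```python
import numpy as np

def tail_frequencies(n, t, n_rep=20000, seed=3):
    """Empirical frequencies of the lower and upper deviation events for ||eps||_n^2."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal((n_rep, n))
    mean_sq = (eps**2).mean(axis=1)  # ||eps||_n^2 with sigma_0 = 1
    lower = np.mean(mean_sq <= 1 - 2 * np.sqrt(t / n))
    upper = np.mean(mean_sq >= 1 + 2 * np.sqrt(t / n) + 2 * t / n)
    return lower, upper, np.exp(-t)

lower, upper, bound = tail_frequencies(n=50, t=1.0)
print(f"lower tail: {lower:.4f}, upper tail: {upper:.4f}, bound exp(-t): {bound:.4f}")
```

The bounds are not tight for moderate $n$, so the simulated frequencies are typically well below $\exp[-t]$; the point of the lemma is that they hold for every $n$ and $t$.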


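Proof of Lemma 3. Suppose $\hat R \le R$ and $\|\epsilon\|_n \ge \underline\sigma$. First we note that the inequality (11)
gives

$$\lambda_0\|\beta^0\|_1/\|\epsilon\|_n \le 2\left(\sqrt{1 + (\eta/2)^2} - 1\right).$$

For the upper bound for $\|\hat\epsilon\|_n$ we use that

$$\|\hat\epsilon\|_n + \lambda_0\|\hat\beta\|_1 \le \|\epsilon\|_n + \lambda_0\|\beta^0\|_1$$

by the definition of the estimator. Hence

$$\|\hat\epsilon\|_n \le \|\epsilon\|_n + \lambda_0\|\beta^0\|_1 \le \left[1 + 2\left(\sqrt{1 + (\eta/2)^2} - 1\right)\right]\|\epsilon\|_n \le (1 + \eta)\|\epsilon\|_n.$$

For the lower bound for $\|\hat\epsilon\|_n$ we use the convexity of both the loss function and the penalty.
Define

$$\alpha := \frac{\eta\|\epsilon\|_n}{\eta\|\epsilon\|_n + \|X(\hat\beta - \beta^0)\|_n}.$$

Note that $0 < \alpha \le 1$. Let $\hat\beta_\alpha$ be the convex combination $\hat\beta_\alpha := \alpha\hat\beta + (1 - \alpha)\beta^0$. Then

$$\|X(\hat\beta_\alpha - \beta^0)\|_n = \alpha\|X(\hat\beta - \beta^0)\|_n = \frac{\eta\|\epsilon\|_n\|X(\hat\beta - \beta^0)\|_n}{\eta\|\epsilon\|_n + \|X(\hat\beta - \beta^0)\|_n} \le \eta\|\epsilon\|_n.$$

Define $\hat\epsilon_\alpha := Y - X\hat\beta_\alpha$. Then, by convexity of $\|\cdot\|_n$ and $\|\cdot\|_1$,

$$\begin{aligned}
\|\hat\epsilon_\alpha\|_n + \lambda_0\|\hat\beta_\alpha\|_1 &\le \alpha\|\hat\epsilon\|_n + \alpha\lambda_0\|\hat\beta\|_1 + (1 - \alpha)\|\epsilon\|_n + (1 - \alpha)\lambda_0\|\beta^0\|_1 \\
&\le \|\epsilon\|_n + \lambda_0\|\beta^0\|_1
\end{aligned}$$

where in the last step we again used that $\hat\beta$ minimizes $\|Y - X\beta\|_n + \lambda_0\|\beta\|_1$. Taking squares
on both sides gives

$$\|\hat\epsilon_\alpha\|_n^2 + 2\lambda_0\|\hat\beta_\alpha\|_1\|\hat\epsilon_\alpha\|_n + \lambda_0^2\|\hat\beta_\alpha\|_1^2 \le \|\epsilon\|_n^2 + 2\lambda_0\|\beta^0\|_1\|\epsilon\|_n + \lambda_0^2\|\beta^0\|_1^2. \tag{16}$$
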


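But

$$\begin{aligned}
\|\hat\epsilon_\alpha\|_n^2 &= \|\epsilon\|_n^2 - 2\epsilon^T X(\hat\beta_\alpha - \beta^0)/n + \|X(\hat\beta_\alpha - \beta^0)\|_n^2 \\
&\ge \|\epsilon\|_n^2 - 2R\|\hat\beta_\alpha - \beta^0\|_1\|\epsilon\|_n + \|X(\hat\beta_\alpha - \beta^0)\|_n^2 \\
&\ge \|\epsilon\|_n^2 - 2R\|\hat\beta_\alpha\|_1\|\epsilon\|_n - 2R\|\beta^0\|_1\|\epsilon\|_n + \|X(\hat\beta_\alpha - \beta^0)\|_n^2.
\end{aligned}$$

Moreover, by the triangle inequality

$$\|\hat\epsilon_\alpha\|_n \ge \|\epsilon\|_n - \|X(\hat\beta_\alpha - \beta^0)\|_n \ge (1 - \eta)\|\epsilon\|_n.$$

Inserting these two inequalities into (16) gives

$$\begin{aligned}
&\|\epsilon\|_n^2 - 2R\|\hat\beta_\alpha\|_1\|\epsilon\|_n - 2R\|\beta^0\|_1\|\epsilon\|_n + \|X(\hat\beta_\alpha - \beta^0)\|_n^2 + 2\lambda_0(1 - \eta)\|\hat\beta_\alpha\|_1\|\epsilon\|_n + \lambda_0^2\|\hat\beta_\alpha\|_1^2 \\
&\quad \le \|\epsilon\|_n^2 + 2\lambda_0\|\beta^0\|_1\|\epsilon\|_n + \lambda_0^2\|\beta^0\|_1^2
\end{aligned}$$

which implies by the assumption $\lambda_0(1 - \eta) \ge R$

$$\begin{aligned}
\|X(\hat\beta_\alpha - \beta^0)\|_n^2 &\le 2(\lambda_0 + R)\|\beta^0\|_1\|\epsilon\|_n + \lambda_0^2\|\beta^0\|_1^2 \\
&\le 4\lambda_0\|\beta^0\|_1\|\epsilon\|_n + \lambda_0^2\|\beta^0\|_1^2
\end{aligned}$$

where in the last inequality we used $R \le (1 - \eta)\lambda_0 \le \lambda_0$. But continuing we see that we
can write the last expression as

$$4\lambda_0\|\beta^0\|_1\|\epsilon\|_n + \lambda_0^2\|\beta^0\|_1^2 = \left(\left(\lambda_0\|\beta^0\|_1/\|\epsilon\|_n + 2\right)^2 - 4\right)\|\epsilon\|_n^2.$$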

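Again invoke the $\ell_1$-sparsity condition

$$\lambda_0\|\beta^0\|_1/\|\epsilon\|_n \le 2\left(\sqrt{1 + (\eta/2)^2} - 1\right)$$

to get

$$\left(\left(\lambda_0\|\beta^0\|_1/\|\epsilon\|_n + 2\right)^2 - 4\right)\|\epsilon\|_n^2 \le \frac{\eta^2}{4}\|\epsilon\|_n^2.$$

We thus established that

$$\|X(\hat\beta_\alpha - \beta^0)\|_n \le \frac{\eta\|\epsilon\|_n}{2}.$$

Rewrite this to

$$\frac{\eta\|\epsilon\|_n\|X(\hat\beta - \beta^0)\|_n}{\eta\|\epsilon\|_n + \|X(\hat\beta - \beta^0)\|_n} \le \frac{\eta\|\epsilon\|_n}{2},$$

and rewrite this in turn to

$$\eta\|\epsilon\|_n\|X(\hat\beta - \beta^0)\|_n \le \frac{\eta^2\|\epsilon\|_n^2}{2} + \frac{\eta\|\epsilon\|_n\|X(\hat\beta - \beta^0)\|_n}{2}$$

or

$$\|X(\hat\beta - \beta^0)\|_n \le \eta\|\epsilon\|_n.$$

But then, by repeating the argument, also

$$\|\hat\epsilon\|_n \ge \|\epsilon\|_n - \|X(\hat\beta - \beta^0)\|_n \ge (1 - \eta)\|\epsilon\|_n.$$

□
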
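Proof of Theorem 2. Throughout the proof we suppose $\hat R \le R$ and $\|\epsilon\|_n \ge \underline\sigma$. Define the
Gram matrix $\hat\Sigma := X^T X/n$. Let $\beta \in \mathbb{R}^p$ and $S := S_\beta = \{j : \beta_j \ne 0\}$. If

$$(\hat\beta - \beta)^T\hat\Sigma(\hat\beta - \beta^0) \le -\delta\underline\lambda\|\hat\beta - \beta\|_1\|\epsilon\|_n$$

we find

$$\begin{aligned}
&2\delta\underline\lambda\|\hat\beta - \beta\|_1\|\epsilon\|_n + \|X(\hat\beta - \beta^0)\|_n^2 \\
&\quad = 2\delta\underline\lambda\|\hat\beta - \beta\|_1\|\epsilon\|_n + \|X(\beta - \beta^0)\|_n^2 - \|X(\hat\beta - \beta)\|_n^2 + 2(\hat\beta - \beta)^T\hat\Sigma(\hat\beta - \beta^0) \\
&\quad \le \|X(\beta - \beta^0)\|_n^2.
\end{aligned}$$

So then we are done.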

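Suppose now that

$$(\hat\beta - \beta)^T\hat\Sigma(\hat\beta - \beta^0) \ge -\delta\underline\lambda\|\hat\beta - \beta\|_1\|\epsilon\|_n.$$

By the KKT-conditions (3)

$$(\hat\beta - \beta)^T\hat\Sigma(\hat\beta - \beta^0) + \lambda_0\|\hat\beta\|_1\|\hat\epsilon\|_n \le \epsilon^T X(\hat\beta - \beta)/n + \lambda_0\|\beta\|_1\|\hat\epsilon\|_n.$$

By the dual norm inequality and since $\hat R \le R$

$$|\epsilon^T X(\hat\beta - \beta)|/n \le R\|\hat\beta - \beta\|_1\|\epsilon\|_n.$$

Thus

$$(\hat\beta - \beta)^T\hat\Sigma(\hat\beta - \beta^0) + \lambda_0\|\hat\beta\|_1\|\hat\epsilon\|_n \le R\|\hat\beta - \beta\|_1\|\epsilon\|_n + \lambda_0\|\beta\|_1\|\hat\epsilon\|_n.$$

This implies by the triangle inequality

$$(\hat\beta - \beta)^T\hat\Sigma(\hat\beta - \beta^0) + (\lambda_0\|\hat\epsilon\|_n - R\|\epsilon\|_n)\|\hat\beta_{-S}\|_1 \le (\lambda_0\|\hat\epsilon\|_n + R\|\epsilon\|_n)\|\hat\beta_S - \beta\|_1.$$

We invoke the result of Lemma 3 which says that $(1 - \eta)\|\epsilon\|_n \le \|\hat\epsilon\|_n \le (1 + \eta)\|\epsilon\|_n$.
This gives

$$(\hat\beta - \beta)^T\hat\Sigma(\hat\beta - \beta^0) + \underline\lambda\|\hat\beta_{-S}\|_1\|\epsilon\|_n \le (\lambda_0(1 + \eta) + R)\|\hat\beta_S - \beta\|_1\|\epsilon\|_n. \tag{17}$$

Since $(\hat\beta - \beta)^T\hat\Sigma(\hat\beta - \beta^0) \ge -\delta\underline\lambda\|\epsilon\|_n\|\hat\beta - \beta\|_1$ this gives

$$(1 - \delta)\underline\lambda\|\hat\beta_{-S}\|_1\|\epsilon\|_n \le (\lambda_0(1 + \eta) + R + \delta\underline\lambda)\|\hat\beta_S - \beta\|_1\|\epsilon\|_n = \bar\lambda\|\hat\beta_S - \beta\|_1\|\epsilon\|_n,$$

or

$$\|\hat\beta_{-S}\|_1 \le L\|\hat\beta_S - \beta\|_1.$$

But then

$$\|\hat\beta_S - \beta\|_1 \le \sqrt{|S|}\,\|X(\hat\beta - \beta)\|_n/\hat\phi(L, S). \tag{18}$$

Continue with inequality (17) and apply the inequality $ab \le (a^2 + b^2)/2$ which holds for
all real valued $a$ and $b$:

$$\begin{aligned}
&(\hat\beta - \beta)^T\hat\Sigma(\hat\beta - \beta^0) + \underline\lambda\|\hat\beta_{-S}\|_1\|\epsilon\|_n + \delta\underline\lambda\|\hat\beta_S - \beta\|_1\|\epsilon\|_n \\
&\quad \le \bar\lambda\|\epsilon\|_n\sqrt{|S|}\,\|X(\hat\beta - \beta)\|_n/\hat\phi(L, S) \\
&\quad \le \frac{1}{2}\frac{\bar\lambda^2|S|\|\epsilon\|_n^2}{\hat\phi^2(L, S)} + \frac{1}{2}\|X(\hat\beta - \beta)\|_n^2.
\end{aligned}$$

Since

$$2(\hat\beta - \beta)^T\hat\Sigma(\hat\beta - \beta^0) = \|X(\hat\beta - \beta^0)\|_n^2 - \|X(\beta - \beta^0)\|_n^2 + \|X(\hat\beta - \beta)\|_n^2,$$

we obtain

$$\|X(\hat\beta - \beta^0)\|_n^2 + 2\underline\lambda\|\hat\beta_{-S}\|_1\|\epsilon\|_n + 2\delta\underline\lambda\|\hat\beta_S - \beta\|_1\|\epsilon\|_n \le \|X(\beta - \beta^0)\|_n^2 + \bar\lambda^2|S|\|\epsilon\|_n^2/\hat\phi^2(L, S).$$

□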
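
The polarization-type identity used in the last step of this proof holds for arbitrary vectors, which makes it easy to verify numerically. The following sketch uses randomly drawn placeholders for $\hat\beta$, $\beta$ and $\beta^0$; the dimensions and the helper name `sq_norm_n` are illustrative assumptions, and no Lasso fit is involved.

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 30, 50
X = rng.standard_normal((n, p))

# Arbitrary placeholder vectors standing in for beta_hat, beta and beta^0.
beta_hat, beta, beta0 = rng.standard_normal((3, p))

def sq_norm_n(vec):
    """Squared empirical norm ||X vec||_n^2 = ||X vec||_2^2 / n."""
    return np.sum((X @ vec) ** 2) / n

gram = X.T @ X / n
cross_term = 2 * (beta_hat - beta) @ gram @ (beta_hat - beta0)
rhs = sq_norm_n(beta_hat - beta0) - sq_norm_n(beta - beta0) + sq_norm_n(beta_hat - beta)

print(cross_term, rhs)
```

Because the identity is exact, the two printed numbers agree up to floating point error whatever vectors are used.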


9.4 Proofs of the illustration assuming (weak) sparsity in Section 5


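Proof of Lemma 4. Define $\lambda_* := \bar\lambda\|\epsilon\|_n/\hat\Lambda_{\max}(S_0)$ and for $j = 1,\ldots,p$,

$$\beta^*_j := \beta^0_j\,1\{|\beta^0_j| > \lambda_*\}.$$

Then

$$\begin{aligned}
\|X(\beta^* - \beta^0)\|_n^2 &\le \hat\Lambda^2_{\max}(S_0)\|\beta^* - \beta^0\|_2^2 \le \hat\Lambda^2_{\max}(S_0)\lambda_*^{2-r}\rho_r^r \\
&= \bar\lambda^{2-r}\hat\Lambda^r_{\max}(S_0)\rho_r^r\|\epsilon\|_n^{2-r} \le \bar\lambda^{2-r}\hat\Lambda^r_{\max}(S_0)\rho_r^r\|\epsilon\|_n^{2-r}/\hat\phi^2(L, S_*)
\end{aligned}$$

where in the last inequality we used $\hat\phi(L, S_*) \le 1$. Moreover, noting that
$S_{\beta^*} = S_* = \{j : |\beta^0_j| > \lambda_*\}$ we get

$$|S_{\beta^*}| \le \lambda_*^{-r}\rho_r^r = \bar\lambda^{-r}\|\epsilon\|_n^{-r}\hat\Lambda^r_{\max}(S_0)\rho_r^r.$$

Thus

$$\bar\lambda^2|S_{\beta^*}|\|\epsilon\|_n^2/\hat\phi^2(L, S_{\beta^*}) \le \bar\lambda^{2-r}\hat\Lambda^r_{\max}(S_0)\rho_r^r\|\epsilon\|_n^{2-r}/\hat\phi^2(L, S_*).$$

Moreover

$$\|\beta^* - \beta^0\|_1 \le \lambda_*^{1-r}\rho_r^r = \bar\lambda^{1-r}\|\epsilon\|_n^{1-r}\rho_r^r/\hat\Lambda^{1-r}_{\max}(S_0),$$

which is bounded further since $\hat\phi(L, S_*)/\hat\Lambda_{\max}(S_0) \le 1$.

□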


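Proof of Lemma 5. The $\ell_1$-sparsity condition (11) holds with $\eta \le 1/3$. Theorem 2 with
$\lambda_0(1 - \eta) = 2R$ gives $\underline\lambda = \lambda_0(1 - \eta) - R = R$ and
$3R \le \bar\lambda = \lambda_0(1 + \eta) + R + \delta\underline\lambda \le (5 + \delta)R$.
We take $\delta = 1/7$. Then $L = \bar\lambda/((1 - \delta)\underline\lambda) \le (5 + \delta)/(1 - \delta) = 6$. Set
$\hat S_* := \{j : |\beta^0_j| > \bar\lambda\|\epsilon\|_n/\hat\Lambda_{\max}(S_0)\}$. On the set where $\|\epsilon\|_n \ge \underline\sigma$ we have
$\hat S_* \subset S_*$ since $\bar\lambda \ge 3R$. We also
have $\bar\lambda/(\delta\underline\lambda) \le 6^2$. Hence, using the arguments of Lemma 4 and the result of Theorem 2,
we get on the set $\hat R \le R$ and $\|\epsilon\|_n \ge \underline\sigma$,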


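$$\frac{\|\hat\beta - \beta^0\|_1}{\|\epsilon\|_n} \le \bar\lambda^{1-r}\left(1 + \frac{6^2\hat\Lambda^r_{\max}(S_0)}{\hat\phi^2(6, S_*)}\right)\left(\frac{\rho_r}{\|\epsilon\|_n}\right)^r.$$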


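Again, we can bound here $1/\|\epsilon\|_n$ by $1/\underline\sigma$. We can moreover bound $\bar\lambda$ by $6R$. Next we see
that on the set where $\hat R \le R$ and $\|\epsilon\|_n \ge \underline\sigma$, by Lemma 3

$$\hat\sigma \ge (1 - \eta)\|\epsilon\|_n \ge (1 - \eta)\underline\sigma.$$

The $\ell_0$-bound follows in the same way, inserting $\beta = \beta^0$ in Theorem 2. Invoke Lemma 2
to show that the set $\{\hat R \le R \cap \|\epsilon\|_n \ge \underline\sigma\}$ has probability at least $1 - \alpha_0 - \underline\alpha$. □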
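
The constants in this proof follow from simple arithmetic, which can be reproduced exactly with rational numbers. The sketch below is only a check of that arithmetic under the stated choices $\lambda_0(1 - \eta) = 2R$ and $\delta = 1/7$; the function name `lemma5_constants` and the normalization $R = 1$ are illustrative assumptions.

```python
from fractions import Fraction

def lemma5_constants(eta, delta, R=Fraction(1)):
    """Constants of Theorem 2 under the choice lambda_0 (1 - eta) = 2R."""
    lam0 = 2 * R / (1 - eta)
    lam_lower = lam0 * (1 - eta) - R                      # underline-lambda
    lam_upper = lam0 * (1 + eta) + R + delta * lam_lower  # bar-lambda
    L = lam_upper / ((1 - delta) * lam_lower)
    ratio = lam_upper / (delta * lam_lower)               # bar-lambda / (delta * underline-lambda)
    return lam_lower, lam_upper, L, ratio

delta = Fraction(1, 7)

# Boundary case eta = 1/3: the bounds L <= 6 and ratio <= 6^2 are attained.
lam_lower, lam_upper, L, ratio = lemma5_constants(Fraction(1, 3), delta)
print(lam_lower, lam_upper, L, ratio)

# A smaller eta gives strictly smaller constants, while bar-lambda stays above 3R.
_, lam_upper_small, L_small, ratio_small = lemma5_constants(Fraction(1, 5), delta)
print(lam_upper_small, L_small, ratio_small)
```

With $\eta = 1/3$ the value of $\bar\lambda$ equals $(5 + \delta)R$, which shows that the choice $\delta = 1/7$ is exactly what makes $L$ come out as the round number 6.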


9.5 Proof of the extension to structured sparsity in Section 6

Proof of Theorem 3. This follows from exactly the same arguments as used in the proof
of Theorem 1, as the KKT-conditions with general norm $\Omega$ imply that

$$\left\|X_{-J}^T(X_J - X_{-J}\hat\Gamma_J)\hat T_J^{-1/2}/n\right\|_{\infty,\Omega_*} \le \lambda.$$

□